Performance Comparison of Time Series Data Using Predictive Data Mining Techniques
Abstract
This paper focuses on the methodology used in applying Time Series Data Mining techniques to financial time series data in order to calculate currency exchange rates of US dollars to Indian Rupees. Four models, namely Multiple Regression in Excel, Multiple Linear Regression of Dedicated Time Series Analysis in Weka, the Vector Autoregressive Model in R and the Neural Network Model using NeuralWorks Predict, are analyzed. All the models are compared on the basis of the forecasting errors they generate. Mean Error (ME), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE) are used as forecast accuracy measures. Results show that all the models accurately predict the exchange rates, but Multiple Linear Regression of Dedicated Time Series Analysis in Weka outperforms the other three models.

Keywords: Exchange Rate Prediction, Time Series Models, Regression, Predictive Data Mining, Weka, VAR, Neural Network

Advances in Information Mining ISSN: 0975-3265 & E-ISSN: 0975-9093, Volume 4, Issue 1, 2012

Introduction

Among emerging technologies, one of the most enticing application areas of data mining is finance, which is becoming more amenable to data-driven modeling as large sets of financial data become available. In the field of finance, data mining is used extensively for forecasting the stock market, pricing corporate bonds, understanding and managing financial risk, trading futures, predicting exchange rates, credit rating and similar tasks. Monthly data are collected for the 10 years from 2000 to 2010 in order to predict the exchange rates of 2011 [5,14,16]. The actual rates for 2011 are available and are compared with the predicted values to calculate the accuracy of the models.
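The comparison of actual and predicted values described above relies on the six error measures named in the abstract. The following is a minimal sketch of how they are commonly computed, not the authors' implementation. It assumes the error is defined as actual minus predicted and that percentage errors are taken relative to the actual value; the excerpt does not state which convention the paper uses. The function name `forecast_errors` and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def forecast_errors(actual, predicted):
    """Return ME, MAE, MSE, RMSE, MPE and MAPE for two equal-length series."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    # Assumed convention: error = actual - predicted.
    errors = [a - p for a, p in zip(actual, predicted)]
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Percentage errors are relative to the actual value, which must be non-zero.
    mpe = 100.0 * sum(e / a for e, a in zip(errors, actual)) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"ME": me, "MAE": mae, "MSE": mse,
            "RMSE": rmse, "MPE": mpe, "MAPE": mape}


# Made-up illustrative values, not data from the paper.
actual = [50.0, 40.0, 50.0, 40.0]
predicted = [49.0, 42.0, 51.0, 38.0]
metrics = forecast_errors(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ME and MPE keep the sign of each error, so over- and under-predictions can cancel out; MAE, MSE, RMSE and MAPE do not cancel, which is why the signed and unsigned measures are usually read together.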
Mean Error (ME), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE) are used as forecast accuracy measures. The variables on which the exchange rate is taken to depend are CPI, Trade Balance (in million US dollars), GDP, Unemployment and Monetary Base (in billion dollars) [16]. Four models, namely Multiple Linear Regression in Excel [14], Multiple Linear Regression of Dedicated Time Series Analysis in Weka [6, 9], the Vector Autoregressive Model in R [6-8, 10] and the Neural Network Model [4,5,13-15] using NeuralWorks Predict, are analyzed. All the models are compared on the basis of the forecasting errors they generate.

The paper is organized as follows. Section II covers predictive data mining. Section III covers the four predictive time series models. Section IV covers the datasets used for the analysis, together with the steps followed and the results obtained with the four models. Section V compares the performance of the four models using ME, MAE, MSE, RMSE, MPE and MAPE. Section VI concludes the work, followed by references in the last section.

Predictive Data Mining

Predictive data mining analyzes data in order to construct one or more models and attempts to predict the behavior of new data sets. Prediction is a form of data analysis that can be used to extract models describing important data classes or to predict future data trends. Such analysis can provide a better understanding of the data at large.
Prediction can also be viewed as a mapping or function, y = f(X), where X is the input (e.g., a tuple describing a loan applicant) and the output y is a continuous or ordered value (such as the predicted amount that the bank can safely loan the applicant). That is, we wish to learn a mapping or function that models the relationship between X and y. There are two issues regarding prediction. The first is preparing the data for prediction, which involves preprocessing steps such as data cleaning, relevance analysis, data transformation and data reduction. The second is comparing the different prediction models. The models are compared according to the criteria given below:

Citation: Saigal S. and Mehrotra D. (2012) Performance Comparison of Time Series Data Using Predictive Data Mining Techniques. Advances in Information Mining, ISSN: 0975-3265 & E-ISSN: 0975-9093, Volume 4, Issue 1, pp. 57-66.

Copyright: © 2012 Saigal S. and Mehrotra D. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
Similar resources
Forecasting Gold Price using Data Mining Techniques by Considering New Factors
Gold price forecast is of great importance. Many models were presented by researchers to forecast gold price. It seems that although different models could forecast gold price under different conditions, the new factors affecting gold price forecast have a significant importance and effect on the increase of forecast accuracy. In this paper, different factors were studied in comparison to the p...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
Fuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information gathering technologies and access to large amounts of data, we always require methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important method for solving this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...
Using Combined Descriptive and Predictive Methods of Data Mining for Coronary Artery Disease Prediction: a Case Study Approach
Heart disease is one of the major causes of morbidity in the world. Currently, large proportions of healthcare data are not processed properly, thus, failing to be effectively used for decision making purposes. The risk of heart disease may be predicted via investigation of heart disease risk factors coupled with data mining knowledge. This paper presents a model developed using combined descri...
An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...